Caching for Realtime Search

نویسندگان

  • Edward Bortnikov
  • Ronny Lempel
  • Kolman Vornovitsky
چکیده

Modern search engines feature real-time indices, which incorporate changes to content within seconds. As search engines also cache search results for reducing user latency and back-end load, without careful real-time management of search results caches, the engine might return stale search results to users despite the efforts invested in keeping the underlying index up to date. A recent paper proposed an architectural component called CIP – the cache invalidation predictor. CIPs invalidate supposedly stale cache entries upon index modifications. Initial evaluation showed the ability to keep the performance benefits of caching without sacrificing much the freshness of search results returned to users. However, it was conducted on a synthetic workload in a simplified setting, using many assumptions. We propose new CIP heuristics, and evaluate them in an authentic environment – on the real evolving corpus and query stream of a large commercial news search engine. Our CIPs operate in conjunction with realistic cache settings, and we use standard metrics for evaluating cache performance. We show that a classical cache replacement policy, LRU, completely fails to guarantee freshness over time, whereas our CIPs serve 97% of the queries with fresh results. Our policies incur a negligible impact on the baseline’s cache hit rate, in contrast with traditional age-based invalidation, which must severely reduce the cache performance in order to achieve the same freshness. We demonstrate that the computational overhead of our algorithms is minor, and that they even allow reducing the cache’s memory footprint.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting Caching for Realtime Multimedia Systems

Several cashing schemes for realtime multimedia systems have been proposed but they only concentrate on the hit ratio as the performance metric without providing any means to utilize the disk bandwidth gained through cache hits. A high cache hit ratio alone does not capture the eeectiveness of a caching scheme for such systems unless hiccup-free service is guaranteed. We evaluate three caching ...

متن کامل

A Novel Caching Strategy in Video-on-Demand (VoD) Peer-to-Peer (P2P) Networks Based on Complex Network Theory

The popularity of video-on-demand (VoD) streaming has grown dramatically over the World Wide Web. Most users in VoD P2P networks have to wait a long time in order to access their requesting videos. Therefore, reducing waiting time to access videos is the main challenge for VoD P2P networks. In this paper, we propose a novel algorithm for caching video based on peers' priority and video's popula...

متن کامل

A Novel Caching Strategy in Video-on-Demand (VoD) Peer-to-Peer (P2P) Networks Based on Complex Network Theory

The popularity of video-on-demand (VoD) streaming has grown dramatically over the World Wide Web. Most users in VoD P2P networks have to wait a long time in order to access their requesting videos. Therefore, reducing waiting time to access videos is the main challenge for VoD P2P networks. In this paper, we propose a novel algorithm for caching video based on peers' priority and video's popula...

متن کامل

Realtime Multimedia Caching Schemes

Several caching schemes for realtime multimedia systems have been proposed but they have only concentrated on the hit ratio as a performance metric without providing any means to utilize the disk bandwidth gained through cache hits. A high cache hit ratio alone does not capture the eeectiveness of a caching scheme for such systems unless hiccup-free service is guaranteed. We evaluate four cachi...

متن کامل

Caching schemes for DCOP search algorithms

Distributed Constraint Optimization (DCOP) is useful for solving agent-coordination problems. Any-space DCOP search algorithms require only a small amount of memory but can be sped up by caching information. However, their current caching schemes do not exploit the cached information when deciding which information to preempt from the cache when a new piece of information needs to be cached. Ou...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011